Views in a Large Scale XML
Authors
Abstract
Figure 9: A View Over the Culture Domain (panels: the abstract DTD; 2 concrete DTDs of the "art" cluster, with elements such as painter, painting, name, year, technique, location, description and biography; some mappings for the "art" cluster: culture/painting -> WorkOfArt, culture/painting -> painter/painting, culture/painting/title -> WorkOfArt/title, culture/painting/title -> painter/painting/name, culture/painting/author -> WorkOfArt/artist/name, culture/painting/author -> painter, culture/painting/museum -> WorkOfArt/gallery, culture/painting/museum -> painter/painting/location)

Another bad answer to the problem is to try to bypass distribution by, e.g., encoding views so as to reduce their size as much as possible. However smart one is at encoding paths, this does not scale. The standard translation pattern must therefore be reconsidered.

We process queries in two steps. First, the query is pre-compiled on an interface machine. Using local information, we find out which clusters of data are concerned. We then generate a plan whose specificity is that it contains abstract rather than concrete tree patterns. E.g., for Query 2.1, the pattern tree of Figure 1 is preserved, while the abstract collection culture is replaced by the actual clusters art, literature and culture. In a second step, the plan is distributed and evaluated. Remember that evaluation always starts on index machines. There, the abstract patterns are translated into concrete ones before they are used to query the index. Section 4 explains this in greater detail. Here, we are concerned with the data structures needed by the translation process.

This two-step translation has some very nice properties. First, we avoid useless broadcasts. Also, the plans that are shipped are small: they do not include the many combinations of concrete patterns matching an abstract one. More importantly, we have solved the distribution problem.
Indeed, the only "global" information that needs to be maintained on the IMs is a correspondence between abstract DTDs and actual clusters. The remaining view information is naturally distributed over the concerned index machines.

Figure 10: View on an Interface Machine (the abstract DTD rooted at culture, with nodes such as painting, author, title, sculpture, period and museum, each annotated with the clusters holding matching concrete paths, e.g., {culture, cinema, art, literature, tourism}, {culture, art, tourism}, {culture, art} or {art})

To illustrate this, let us reconsider the view example introduced in the previous section and presented in Figure 9. We consider here the representation of a view in main memory. The persistent representation is a straightforward translation into XML documents. As explained above, we distribute the view among interface and index machines. On each interface machine, we find a tree representing an annotated abstract DTD. More precisely, each node is marked with the actual clusters in which matching concrete paths exist (see Figure 10 for an example). This structure is used at pre-processing time to determine how the query should be distributed. It is replicated because each IM must be able to translate all queries. It could have been made smaller by keeping only the root of the abstract DTD. However, in its current form, it allows us to (i) check the abstract "typing" of queries and (ii) reduce the number of subplans. For example, if the user is interested in titles of paintings, there is no need to generate a plan over the tourism cluster. We will also see in Section 5 that this tree is used to support joins in view definitions.

Figure 11: View on an Index Machine
Table of concrete paths (entry, tag, father):
0 WorkOfArt -1
1 artist 0
2 name 1
3 gallery 0
4 title 0
5 painter -1
6 painting 5
7 name 6
8 year 6
9 location 6
Tree of abstract paths, with the (root, cpath) pairs attached to each node:
culture/painting: (0, 0), (5, 6)
culture/painting/title: (0, 4), (5, 7)
culture/painting/author: (0, 2), (5, 5)
culture/painting/museum: (0, 3), (5, 9)
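The annotated abstract DTD kept on each interface machine can be pictured as a plain tree whose nodes carry cluster sets. The following Python sketch is an illustration under stated assumptions, not Xyleme's code. The class `AbstractNode`, the helpers `lookup` and `clusters_for`, and the cluster sets attached to each node are all hypothetical. The sets were chosen to be consistent with the "titles of paintings" example above. Unknown paths are rejected, which mirrors the abstract "typing" check. Intersecting the cluster sets of the pattern's nodes, assuming every node is mandatory, shows how subplans over irrelevant clusters can be avoided.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractNode:
    """A node of the annotated abstract DTD (hypothetical representation)."""
    tag: str
    clusters: frozenset  # clusters containing at least one matching concrete path
    children: dict = field(default_factory=dict)

    def add(self, tag, clusters):
        """Create, attach and return a child node annotated with `clusters`."""
        child = AbstractNode(tag, frozenset(clusters))
        self.children[tag] = child
        return child


def lookup(root, path):
    """Follow a slash-separated abstract path; return None if it is not in the DTD."""
    tags = path.split("/")
    if tags[0] != root.tag:
        return None
    node = root
    for tag in tags[1:]:
        node = node.children.get(tag)
        if node is None:
            return None
    return node


def clusters_for(root, paths):
    """Clusters worth a subplan, assuming every path of the pattern is mandatory."""
    result = None
    for path in paths:
        node = lookup(root, path)
        if node is None:  # the abstract "typing" check fails
            raise ValueError(f"unknown abstract path: {path}")
        result = node.clusters if result is None else result & node.clusters
    return result if result is not None else frozenset()


# Illustrative annotations (assumed, not read off Figure 10).
culture = AbstractNode("culture", frozenset({"culture", "cinema", "art", "literature", "tourism"}))
painting = culture.add("painting", {"culture", "art", "tourism"})
painting.add("title", {"culture", "art"})
painting.add("museum", {"culture", "art", "tourism"})

print(sorted(clusters_for(culture, ["culture/painting", "culture/painting/title"])))  # ['art', 'culture']
```

With these assumed annotations, a query on titles of paintings yields no subplan for the tourism cluster. A query on museums would keep tourism.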
Note that interface machines manage only abstract DTDs and their associated clusters, two items whose size is usually rather small and tightly controlled. On each index machine, we find the view information relative to its indexed clusters. Figure 11 corresponds to the representation of the mappings given in Figure 5. It consists of two parts, a table and a tree.

(i) The table represents, in a simple way, the forest of all concrete paths that have been mapped to some abstract paths. Each node is represented by its tag and the identifier of its father (-1 when it is a root). Nodes are identified by their entry in the table. E.g., 2 identifies WorkOfArt/artist/name.

(ii) The tree maps abstract paths to concrete paths. Concrete paths are represented in the tree by two integers. The first identifies the concrete path itself (cpath) and the second the DTD root element from which it stems (root). As explained in Section 4, the root serves to reduce the complexity of the translation process: there are thousands of concrete paths, but rarely more than a dozen stemming from one root element.

The size allocated to a view on an index machine is very small compared to the size of the index itself (usually less than a thousandth). The size of a view also depends on the size and heterogeneity of clusters. When a cluster grows too big, we refine the classification so as to split it. This results in a reorganization of store and indexes that is performed lazily while (re-)loading. Views are reconstructed when the index reorganization is over. In the meantime, views are simply larger than they should be.

3.3 Maintaining Views

Let us now consider how view data structures are maintained. Each interface machine runs a process, called the global process, in which queries are compiled and controlled. All index machines (as well as repository machines) run a process, called the local process, in which local subplans are evaluated (see Figure 12).
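Returning to the index-machine representation of Figure 11, its table and tree can be sketched in a few lines of Python. This is a hypothetical illustration. The names `CONCRETE`, `ABSTRACT`, `concrete_path` and `by_root` are ours, and the tree is flattened into a dictionary keyed by abstract path for brevity. The entries and (root, cpath) pairs reproduce Figure 11. Grouping by root shows why the root integer is stored: translation only needs to combine the few concrete paths that stem from the same root element.

```python
# Table of concrete paths: entry -> (tag, father entry); -1 marks a root.
CONCRETE = [
    ("WorkOfArt", -1),  # 0
    ("artist", 0),      # 1
    ("name", 1),        # 2
    ("gallery", 0),     # 3
    ("title", 0),       # 4
    ("painter", -1),    # 5
    ("painting", 5),    # 6
    ("name", 6),        # 7
    ("year", 6),        # 8
    ("location", 6),    # 9
]

# Tree of abstract paths, flattened into a dict: abstract path -> [(root, cpath), ...].
ABSTRACT = {
    "culture/painting": [(0, 0), (5, 6)],
    "culture/painting/title": [(0, 4), (5, 7)],
    "culture/painting/author": [(0, 2), (5, 5)],
    "culture/painting/museum": [(0, 3), (5, 9)],
}


def concrete_path(entry):
    """Rebuild a concrete path by following father links up to its root."""
    tags = []
    while entry != -1:
        tag, father = CONCRETE[entry]
        tags.append(tag)
        entry = father
    return "/".join(reversed(tags))


def by_root(abstract_path):
    """Group the concrete paths mapped to `abstract_path` by DTD root element."""
    groups = {}
    for root, cpath in ABSTRACT.get(abstract_path, []):
        groups.setdefault(root, []).append(concrete_path(cpath))
    return groups


print(concrete_path(2))                    # WorkOfArt/artist/name
print(by_root("culture/painting/author"))  # {0: ['WorkOfArt/artist/name'], 5: ['painter']}
```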
The view data structures are part of these processes so that they can be shared by all queries. Not surprisingly, they are managed by one object, called the view manager, that is global (on IMs) or local (on XMs). When a query needs to access a view structure, it asks the corresponding view manager for a navigator (for trees) or an accessor (for tables). These are simple objects pointing to the view data structures.

Both global and local processes contain a CORBA server called the view server. View servers are accessed when one wants to install or update a view. They are mono-threaded objects, i.e., we support one view update at a time. To perform an update, one sends a message to the global view server with the name of the view and a file containing the new mappings. The update is then processed as follows.

Global View Server
1. Fetch the XML document that contains the information concerning the view (notably, the names of its associated local view servers).
2. Install in memory the tree structure connecting abstract paths to clusters (see Figure 10).
3. Interpret the file containing the new mappings, updating the annotated tree of abstract paths and partitioning the mappings according to clusters.
4. For each cluster concerned by the new mappings, send an update message with the appropriate mappings to the corresponding local view server(s).
5. If one local update goes wrong (e.g., a server is down), return failure. The persistent representation of the view is then inconsistent, and only a successful replay of the full update will make it consistent again.
6. Otherwise, save the new information concerning the view and return success.

Local View Server
1. If this is not the first update concerning this view and this cluster, fetch the XML document that stores the previous updates and install the corresponding data structures in memory (see Figure 11 and footnote 4).
2. Read the new mappings, updating the in-memory representation of mappings.
3. Save the new view information and return success.
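The global side of this update procedure, with its partitioning by cluster and its failure path, can be sketched as follows. The sketch assumes that the mapping file has already been parsed into (abstract path, cluster, concrete path) triples. The classes `GlobalViewServer` and `LocalViewServer` and their method names are hypothetical, as are the in-process calls standing in for CORBA messages. Fetching and saving XML view documents is not modelled. A lock serializes updates, mirroring the one-update-at-a-time behaviour of the mono-threaded view server.

```python
import threading
from collections import defaultdict


class LocalViewServer:
    """Stand-in for a local view server on an index machine (hypothetical)."""

    def __init__(self, up=True):
        self.up = up
        self.mappings = defaultdict(set)  # abstract path -> concrete paths

    def update(self, view_name, mappings):
        if not self.up:  # e.g., the server is down
            return False
        for abstract_path, concrete_path in mappings:
            self.mappings[abstract_path].add(concrete_path)
        return True  # a real server would also save the view document here


class GlobalViewServer:
    """Processes one view update at a time, like the mono-threaded view server."""

    def __init__(self, local_servers):
        self.local_servers = local_servers   # cluster -> [LocalViewServer, ...]
        self.annotations = defaultdict(set)  # abstract path -> clusters (in memory only)
        self._lock = threading.Lock()

    def update(self, view_name, new_mappings):
        """new_mappings: iterable of (abstract path, cluster, concrete path) triples."""
        with self._lock:
            per_cluster = defaultdict(list)
            for abstract_path, cluster, concrete_path in new_mappings:
                self.annotations[abstract_path].add(cluster)  # annotate the abstract tree
                per_cluster[cluster].append((abstract_path, concrete_path))
            for cluster, mappings in per_cluster.items():
                for server in self.local_servers.get(cluster, []):
                    if not server.update(view_name, mappings):
                        return False  # persistent view stays inconsistent until a replay
            return True  # a real server would save the new view information here


servers = {"art": [LocalViewServer()], "tourism": [LocalViewServer(up=False)]}
gvs = GlobalViewServer(servers)
print(gvs.update("culture", [("culture/painting", "art", "WorkOfArt")]))  # True
# Hypothetical tourism mapping whose local server is down:
print(gvs.update("culture", [("culture/painting/museum", "tourism", "site/museum")]))  # False
```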
Note that, once the update has been performed, queries are still evaluated with the old version of the view. To take the new version into account, one has to send an installation message to the global view server.

Global View Server
1. Fetch the XML view document.
2. Send an install message to all concerned local view servers.
3. If one fails, return failure. This time, it is the distributed main-memory representation of the view that is inconsistent: some local subplans will be evaluated using the old version, others using the new one. In all probability, the user will not see the difference. Again, a successful replay of the installation will correct the situation.
4. Install in memory the view data structure.
5. Ask the view manager to swap this structure with the one it is currently managing.
6. Wait until no query uses the old structure.
7. Delete it and return success.

Local View Server
1. Fetch the XML local view document.
2. Install the view structures in memory.
3. Ask the view manager to swap this structure with the one it is currently managing.
4. Wait until no query uses the old structure.
5. Delete it and return success.

Footnote 4: In fact, the representation of the table of concrete paths used for updates is slightly more complex than the one used by queries. Notably, it allows navigation in all directions.

Figure 12: Global and Local Processes in Xyleme (a global process containing global query processors, a global view server and a view manager holding views; a local process containing local query processors, a local view server and a view manager holding views)

We are now ready to see how abstract queries are translated into concrete ones.

4 Query Translation

As briefly explained in the previous section, the evaluation of a query against a view is performed in two steps:
1. On some interface machine, the query is parsed, and an algebraic expression is generated.
   This expression features operations called PatternScan that retrieve, from a collection of documents, the elements that match a given pattern (see Section 2). At this stage, both collections and patterns are abstract. The Abstract Query Translator (AQT) translates abstract collections into actual ones. The query is then partly optimized, and an execution plan is generated and distributed.
2. Plan evaluation starts on index machines. There, a physical operator called Abstract to Concrete (A2C) translates abstract patterns into unions of concrete patterns. Each generated pattern is given to a physical algebraic operator called FTIscan. FTIscan matches the pattern efficiently against the indexed documents and returns the selected elements and documents. These are further processed by other operators, on other machines.

We are not concerned with the full query evaluation process here (see [3]), only with the part concerning view translation. More precisely, we describe the AQT and A2C operators.
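The root-based grouping that A2C can exploit is sketched below. This is a simplified, hypothetical illustration, not Xyleme's operator. An abstract pattern is reduced to a list of distinct abstract paths, the view is the data of Figure 11 with concrete paths spelled out, and the function name `a2c` is ours. Each concrete pattern picks, for every abstract path, one concrete path stemming from a single root element. Combinations across roots are never generated, which is the complexity reduction motivated in Section 3.

```python
from itertools import product

# View on one index machine: abstract path -> [(root, concrete path), ...].
# Concrete paths are spelled out here instead of being table entries (Figure 11).
VIEW = {
    "culture/painting": [(0, "WorkOfArt"), (5, "painter/painting")],
    "culture/painting/title": [(0, "WorkOfArt/title"), (5, "painter/painting/name")],
    "culture/painting/author": [(0, "WorkOfArt/artist/name"), (5, "painter")],
    "culture/painting/museum": [(0, "WorkOfArt/gallery"), (5, "painter/painting/location")],
}


def a2c(pattern):
    """Translate an abstract pattern (a list of distinct abstract paths) into a union
    of concrete patterns. Each concrete pattern maps every abstract path to one
    concrete path, and all of them stem from the same DTD root element."""
    per_root = {}  # root -> abstract path -> candidate concrete paths
    for node in pattern:
        for root, cpath in VIEW.get(node, []):
            per_root.setdefault(root, {}).setdefault(node, []).append(cpath)
    union = []
    for root in sorted(per_root):
        candidates = per_root[root]
        if any(node not in candidates for node in pattern):
            continue  # this root cannot match the whole pattern
        # Combine candidates only within one root, never across roots.
        for choice in product(*(candidates[node] for node in pattern)):
            union.append(dict(zip(pattern, choice)))
    return union


for concrete in a2c(["culture/painting", "culture/painting/title", "culture/painting/author"]):
    print(concrete)
```

Under root 5, the abstract author node maps to painter, an ancestor of the painting element. A real translation must therefore also rebuild the tree shape of each concrete pattern before handing it to FTIscan. The sketch only enumerates the path combinations.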